Elbowplot

Identifying the true dimensionality of a dataset, and therefore which principal components (PCs) are significant, can be challenging. The elbow plot ranks the principal components by the percentage of variance each one explains. In this example, an elbow (i.e. the point where the curve flattens into a straight line) appears somewhere between PC 20 and PC 25, suggesting that the majority of the true signal is captured in the first 21 PCs.
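The elbow can also be estimated numerically rather than by eye. The following is a minimal Python sketch, not the code used to produce this report: it converts per-PC standard deviations into percentages of variance and reports the PC at which the curve starts to flatten. The function names, the 0.1 percentage-point threshold, and the standard deviations are all illustrative assumptions.

```python
def percent_variance(stdevs):
    """Percentage of total variance explained by each PC (variance = stdev squared)."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    return [100.0 * v / total for v in variances]


def elbow_pc(pct, min_drop=0.1):
    """Return the 1-based PC at which the curve flattens.

    This is the PC that follows the last drop larger than `min_drop`
    percentage points (a heuristic threshold, assumed for illustration).
    """
    last_big_drop = None
    for i in range(len(pct) - 1):
        if pct[i] - pct[i + 1] > min_drop:
            last_big_drop = i
    if last_big_drop is None:
        return 1  # the curve is flat from the first PC onwards
    return last_big_drop + 2  # +1 for the next PC, +1 for 1-based numbering


# Synthetic standard deviations, NOT taken from this dataset.
toy_stdevs = [10, 8, 6, 5, 2, 2, 2, 2]
toy_pct = percent_variance(toy_stdevs)
toy_elbow = elbow_pc(toy_pct)
print([round(p, 2) for p in toy_pct], "elbow at PC", toy_elbow)
```

A threshold-based estimate like this is only a starting point; the chosen number of PCs should still be checked against the plot itself.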

Third clustering

Cell count and percentage contribution of each cluster in the Naive and Inflamed populations

## 
##    0    1    2    3    4    5    6    7    8    9 
## 7283 4410 3335 3044 2962 2231 1602  380  281  237

##    expt.type    0    1    2    3    4    5    6   7   8   9
## 1:     Naive 2738 1410  866  771 1251  714   66 147 146  48
## 2:  Inflamed 4545 3000 2469 2273 1711 1517 1536 233 135 189
## [1] "Naive-cell count"
## [1] 8157
## [1] "Inflamed-cell count"
## [1] 17608

## # A tibble: 20 × 4
## # Groups:   Condition [2]
##    Cluster_ID Condition Count Percent
##    <fct>      <fct>     <dbl>   <dbl>
##  1 0          Naive      2738  33.6  
##  2 1          Naive      1410  17.3  
##  3 2          Naive       866  10.6  
##  4 3          Naive       771   9.45 
##  5 4          Naive      1251  15.3  
##  6 5          Naive       714   8.75 
##  7 6          Naive        66   0.809
##  8 7          Naive       147   1.80 
##  9 8          Naive       146   1.79 
## 10 9          Naive        48   0.588
## 11 0          Inflamed   4545  25.8  
## 12 1          Inflamed   3000  17.0  
## 13 2          Inflamed   2469  14.0  
## 14 3          Inflamed   2273  12.9  
## 15 4          Inflamed   1711   9.72 
## 16 5          Inflamed   1517   8.62 
## 17 6          Inflamed   1536   8.72 
## 18 7          Inflamed    233   1.32 
## 19 8          Inflamed    135   0.767
## 20 9          Inflamed    189   1.07
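The Percent column above is each cluster's count divided by the total cell count of its condition. As a check, the arithmetic can be reproduced from the per-cluster counts printed above; this is a minimal Python sketch with variable names chosen for illustration.

```python
# Per-cluster cell counts (clusters 0-9), copied from the output above.
counts = {
    "Naive":    [2738, 1410, 866, 771, 1251, 714, 66, 147, 146, 48],
    "Inflamed": [4545, 3000, 2469, 2273, 1711, 1517, 1536, 233, 135, 189],
}

# Total number of cells per condition.
totals = {cond: sum(vals) for cond, vals in counts.items()}

# Percentage of each condition's cells that falls in each cluster.
percent = {
    cond: [100.0 * n / totals[cond] for n in vals]
    for cond, vals in counts.items()
}

for cond, vals in percent.items():
    print(cond, totals[cond])
    for cluster_id, p in enumerate(vals):
        print(f"  cluster {cluster_id}: {p:.3g}%")
```

Normalising within each condition matters here because the Inflamed population (17608 cells) is more than twice the size of the Naive population (8157 cells), so raw counts alone are not comparable.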

DEG identification

Heat map of top 10 differentially expressed genes

Here, Inflamed is the treatment group and Naive is the control group. The comparison starts with C0 cells in Inflamed against C0-C9 cells in Naive and ends with C9 cells in Inflamed against C0-C9 cells in Naive. The top and bottom 10 genes are selected based on the absolute values of p_val_adj and avg_log2FC.
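One possible reading of this selection step is sketched below in Python: keep genes whose p_val_adj passes a significance cutoff, then take the n most up-regulated and n most down-regulated genes by avg_log2FC. The 0.05 cutoff, the function name, and the gene names and values are hypothetical, and the exact rule used for the heat map may differ.

```python
def top_and_bottom(markers, n=10, alpha=0.05):
    """Split significant markers into the n most up- and down-regulated genes.

    `markers` is a list of dicts with keys "gene", "avg_log2FC", "p_val_adj".
    `alpha` is an assumed significance cutoff, not taken from the report.
    """
    significant = [m for m in markers if m["p_val_adj"] < alpha]
    up = sorted(
        (m for m in significant if m["avg_log2FC"] > 0),
        key=lambda m: m["avg_log2FC"],
        reverse=True,
    )[:n]
    down = sorted(
        (m for m in significant if m["avg_log2FC"] < 0),
        key=lambda m: m["avg_log2FC"],
    )[:n]
    return up, down


# Hypothetical marker table; gene names and values are made up.
toy_markers = [
    {"gene": "geneA", "avg_log2FC": 2.1, "p_val_adj": 1e-20},
    {"gene": "geneB", "avg_log2FC": -1.8, "p_val_adj": 1e-12},
    {"gene": "geneC", "avg_log2FC": 3.0, "p_val_adj": 0.40},  # not significant
    {"gene": "geneD", "avg_log2FC": 0.9, "p_val_adj": 1e-5},
    {"gene": "geneE", "avg_log2FC": -0.4, "p_val_adj": 0.01},
    {"gene": "geneF", "avg_log2FC": 1.5, "p_val_adj": 1e-8},
]

up, down = top_and_bottom(toy_markers, n=2)
print([m["gene"] for m in up], [m["gene"] for m in down])
```

Filtering on the adjusted p-value first keeps a gene with a large but unreliable fold change (geneC in the toy table) out of the heat map.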

Clustree

Clustering is a core tool for analysing single-cell RNA-sequencing (scRNA-seq) datasets. The clustering is primarily controlled by two parameters: the number of principal components and the resolution. A clustering tree visualises the relationships between clusterings at a range of resolutions.
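To illustrate what the edges of a clustering tree represent, the following Python sketch compares cluster labels for the same cells at two resolutions. For each edge from a lower-resolution cluster to a higher-resolution cluster, it computes the fraction of the higher-resolution cluster's cells that come from that parent, a quantity often called the in-proportion. The function name and the labels are toy assumptions, not values from this dataset.

```python
from collections import Counter


def edge_in_proportions(low_res, high_res):
    """Map (parent, child) edges to the share of the child's cells from that parent.

    `low_res` and `high_res` are cluster labels for the same cells, in the
    same order, at a lower and a higher clustering resolution.
    """
    if len(low_res) != len(high_res):
        raise ValueError("both label lists must cover the same cells")
    edge_counts = Counter(zip(low_res, high_res))
    child_sizes = Counter(high_res)
    return {
        (parent, child): n / child_sizes[child]
        for (parent, child), n in edge_counts.items()
    }


# Toy labels for eight cells (hypothetical).
toy_low = [0, 0, 0, 0, 1, 1, 1, 1]
toy_high = [0, 0, 1, 1, 2, 2, 2, 1]

edges = edge_in_proportions(toy_low, toy_high)
for (parent, child), prop in sorted(edges.items()):
    print(f"low-res {parent} -> high-res {child}: {prop:.2f}")
```

A child cluster that draws nearly all of its cells from one parent indicates a clean split, whereas a child fed by several parents (high-resolution cluster 1 in the toy labels) suggests the clustering is becoming unstable at that resolution.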